AITopics | nesterov acceleration

Randomized Subspace Nesterov Accelerated Gradient

Randomized-subspace methods reduce the cost of first-order optimization by using only low-dimensional projected-gradient information, a feature that is attractive in forward-mode automatic differentiation and communication-limited settings. While Nesterov acceleration is well understood for full-gradient and coordinate-based methods, it is technically nontrivial to obtain accelerated methods for general subspace sketches that use only projected-gradient information and can still improve on full-dimensional Nesterov acceleration in oracle complexity. We develop randomized-subspace Nesterov accelerated gradient methods for smooth convex and smooth strongly convex optimization under matrix smoothness and generic sketch moment assumptions. The key technical ingredient is a three-sequence formulation tailored to matrix smoothness, which recovers the corresponding classical Nesterov methods in the full-dimensional case. The resulting theory establishes accelerated oracle-complexity guarantees and makes explicit how matrix smoothness and the sketch distribution enter the complexity. It also provides a unified basis for comparing sketch families and for identifying when randomized-subspace acceleration improves on full-dimensional Nesterov acceleration in oracle complexity.
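The projected-gradient oracle these methods rely on can be illustrated with the simplest member of the family. The sketch below is hypothetical and is not the paper's accelerated three-sequence method: it uses random coordinate subsets as the sketch P (so Pᵀ∇f(x) is just s directional derivatives), a single global smoothness constant L taken from a Gershgorin bound in place of a matrix-smoothness model, and no momentum. With orthonormal sketch columns, the step x ← x − (1/L) P Pᵀ∇f(x) decreases f by at least ||Pᵀ∇f(x)||²/(2L), which is the basic fact an accelerated variant would build on.

```python
import random


def lambda_max_bound(A):
    # Gershgorin: the largest absolute row sum bounds lambda_max of symmetric A.
    return max(sum(abs(v) for v in row) for row in A)


def randomized_subspace_gd(A, b, s, iters, seed=0):
    # Minimize f(x) = 1/2 x^T A x - b^T x using only s-dimensional projected
    # gradients P^T grad f(x), where P = I[:, S] for a random subset S.
    d = len(b)
    L = lambda_max_bound(A)
    rng = random.Random(seed)
    x = [0.0] * d
    for _ in range(iters):
        S = rng.sample(range(d), s)
        # s directional derivatives grad f(x)^T e_i = (A x - b)_i for i in S,
        # all evaluated at the current x before any coordinate moves.
        g_S = [sum(A[i][j] * x[j] for j in range(d)) - b[i] for i in S]
        # x <- x - (1/L) P P^T grad f(x): only coordinates in S move.
        for i, g in zip(S, g_S):
            x[i] -= g / L
    return x


# Hypothetical test problem: 1-D Laplacian (2 on the diagonal, -1 next to it).
d = 10
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(d)]
     for i in range(d)]
b = [sum(row) for row in A]  # b = A @ ones, so the minimizer is all ones
x = randomized_subspace_gd(A, b, s=3, iters=5000)
print(max(abs(v - 1.0) for v in x))
```

On this hypothetical problem the iterates approach the all-ones minimizer. Replacing the global 1/L step with curvature information per sketched block is roughly where a matrix-smoothness model, as in the abstract, would enter.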

artificial intelligence, machine learning, sketch, (17 more...)

arXiv.org Machine Learning

2605.0074

Country:

Asia > Japan (0.28)
North America > United States (0.28)

Genre: Research Report (0.83)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

019f8b946a256d9357eadc5ace2c8678-Supplemental.pdf

Neural Information Processing Systems, Apr-24-2026, 10:10:26 GMT

artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

019f8b946a256d9357eadc5ace2c8678-Paper.pdf

Neural Information Processing Systems, Apr-24-2026, 10:10:22 GMT

artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

ec26fc2eb2b75aece19c70392dc744c2-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 18:25:54 GMT

We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe > France > Île-de-France > Paris > Paris (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms

Neural Information Processing Systems, Dec-25-2025, 04:20:58 GMT

We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the parameters; but a discretization of the continuized process can be computed exactly with convergence rates similar to those of Nesterov's original acceleration. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters. We provide continuized Nesterov acceleration under deterministic as well as stochastic gradients, with either additive or multiplicative noise. Finally, using our continuized framework and expressing the gossip averaging problem as the stochastic minimization of a certain energy function, we provide the first rigorous acceleration of asynchronous gossip algorithms.
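The continuized scheme can be sketched as follows for a μ-strongly convex, L-smooth objective. This is a minimal illustration, not the authors' code: the equal mixing rates η = η' = sqrt(μ/L) and the gradient step sizes γ = 1/L (for x) and γ' = 1/sqrt(μL) (for z) are assumptions recalled from the continuized-acceleration literature rather than stated in the abstract, and the diagonal quadratic test function is hypothetical. With equal mixing rates the linear ODE keeps (x + z)/2 fixed and shrinks x − z by exp(−2ηΔt), so the mixing between two random jump times is computed exactly, which is what makes the discretization exact.

```python
import math
import random


def continuized_nesterov(grad, x0, mu, L, n_jumps, seed=0):
    # Parameter choices below are assumptions (strongly convex case).
    eta = math.sqrt(mu / L)             # mixing rate, eta = eta'
    gamma = 1.0 / L                     # gradient step for x
    gamma_z = 1.0 / math.sqrt(mu * L)   # gradient step for z
    rng = random.Random(seed)
    x, z = list(x0), list(x0)
    t = 0.0
    for _ in range(n_jumps):
        dt = rng.expovariate(1.0)       # gap to the next jump of a rate-1 Poisson process
        t += dt
        # Exact solution of dx = eta (z - x) dt, dz = eta (x - z) dt:
        # (x + z) / 2 is conserved and x - z decays like exp(-2 eta dt).
        decay = math.exp(-2.0 * eta * dt)
        for i in range(len(x)):
            mid = 0.5 * (x[i] + z[i])
            half_gap = 0.5 * (x[i] - z[i]) * decay
            x[i], z[i] = mid + half_gap, mid - half_gap
        # Jump: both variables take a gradient step using grad f at the current x.
        g = grad(x)
        x = [xi - gamma * gi for xi, gi in zip(x, g)]
        z = [zi - gamma_z * gi for zi, gi in zip(z, g)]
    return x, t


# Hypothetical test problem: f(x) = 1/2 * (x_0^2 + 100 x_1^2), so mu = 1, L = 100.
A_DIAG = [1.0, 100.0]


def f(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(A_DIAG, x))


def grad_f(x):
    return [a * xi for a, xi in zip(A_DIAG, x)]


x_final, t_final = continuized_nesterov(grad_f, [1.0, 1.0], mu=1.0, L=100.0, n_jumps=500)
print(f(x_final), t_final)
```

Note that the gradient is evaluated once per jump and reused for both variables, so the per-jump cost matches one step of a discrete two-variable Nesterov scheme; only the mixing weights become random.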

continuized acceleration, deterministic and stochastic gradient descent, nesterov acceleration, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.44)

ec26fc2eb2b75aece19c70392dc744c2-Paper.pdf

Neural Information Processing Systems, Aug-18-2025, 13:36:10 GMT

artificial intelligence, machine learning, nesterov acceleration, (14 more...)

Neural Information Processing Systems

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Asia > Middle East > Jordan (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.33)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.30)

On the Effectiveness of the z-Transform Method in Quadratic Optimization

Bach, Francis

arXiv.org Artificial Intelligence, Jul-18-2025

Characterizing the convergence of real-valued or vector-valued sequences is a key theoretical problem in data science, where the sequence index typically corresponds to the number of iterations of an iterative algorithm (as in optimization and signal processing) or to the number of observations (as in statistics and machine learning). This characterization can be done in two main ways: asymptotically or non-asymptotically. An asymptotic analysis identifies an asymptotic equivalent of the sequence, which readily allows comparisons with other algorithms; however, without further analysis, the behavior at any finite time cannot be controlled. This is exactly what non-asymptotic analysis aims to achieve, by providing bounds that are valid even for a finite index, but these bounds cannot always be compared. While both approaches have their own merits, in this paper we focus on asymptotic analysis and on sequences that tend to their limit at a sub-exponential rate that is a power of the sequence index. The main goal of this paper is to show how a classical tool from signal processing, control theory, and electrical engineering (Oppenheim et al., 1996), the z-transform method (Jury, 1964), can be used in this context with striking efficiency to obtain asymptotic equivalents for the class of algorithms that can be seen as iterations of potentially random linear operators in a Hilbert space. This includes gradient descent for quadratic optimization problems as well as its accelerated and stochastic variants (Nesterov, 2018), Landweber iterations in inverse problems (Benning and Burger, 2018), and gossip algorithms in distributed computing (Boyd et al., 2006).
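For the quadratic case the z-transform idea can be checked numerically. Gradient descent with unit step on f(θ) = 1/2 θᵀHθ gives a loss a_n = 1/2 Σ_k λ_k (1 − λ_k)^(2n) when θ_0 has unit weight on each eigenvector, and each eigenvalue contributes a geometric series to A(z) = Σ_n a_n z^n, so A(z) = 1/2 Σ_k λ_k / (1 − z(1 − λ_k)²) in closed form. If A(z) ~ c(1 − z)^(−α) as z → 1, then, under regularity conditions such as monotonicity of a_n (which holds here), a_n ~ c n^(α−1)/Γ(α). The power-law spectrum λ_k = 1/k², its truncation N, and the sample points are hypothetical choices; this is a sketch under those assumptions, not the paper's derivation. It reads α off A(z) near z = 1 and compares the predicted exponent α − 1 with the decay of a_n computed directly.

```python
import math


def gd_loss(lams, n):
    # a_n = 1/2 * sum_k lam_k * (1 - lam_k)^(2n): loss after n unit-step
    # gradient-descent iterations, unit initial weight on each eigenvector.
    return 0.5 * sum(lam * (1.0 - lam) ** (2 * n) for lam in lams)


def gd_loss_ztransform(lams, z):
    # A(z) = sum_n a_n z^n in closed form: each eigenvalue contributes the
    # geometric series 1/2 * lam / (1 - z (1 - lam)^2), valid for |z| < 1.
    return 0.5 * sum(lam / (1.0 - z * (1.0 - lam) ** 2) for lam in lams)


# Hypothetical power-law spectrum lam_k = 1/k^2, truncated at N eigenvalues.
N = 200_000
lams = [1.0 / k**2 for k in range(1, N + 1)]

# Read alpha off A(z) ~ c (1 - z)^(-alpha) using two points close to z = 1.
e1, e2 = 1e-3, 1e-4
alpha = math.log(gd_loss_ztransform(lams, 1.0 - e2)
                 / gd_loss_ztransform(lams, 1.0 - e1)) / math.log(e1 / e2)

# Direct check: log-log slope of a_n between n = 1000 and n = 4000,
# which the transfer a_n ~ c n^(alpha - 1) / Gamma(alpha) predicts.
slope = math.log(gd_loss(lams, 4000) / gd_loss(lams, 1000)) / math.log(4.0)
print(alpha, slope)  # alpha near 0.5, slope near alpha - 1 = -0.5
```

The point of the design is that A(z) needs no iteration at all: the singular behavior at z = 1 comes from the small eigenvalues, and the exponent of n follows from it.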

artificial intelligence, machine learning, sequence, (20 more...)

arXiv.org Artificial Intelligence

2507.03404

Country: Europe (0.46)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms

Neural Information Processing Systems, May-27-2025, 06:17:50 GMT

We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the parameters; but a discretization of the continuized process can be computed exactly with convergence rates similar to those of Nesterov's original acceleration. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters. We provide continuized Nesterov acceleration under deterministic as well as stochastic gradients, with either additive or multiplicative noise.

artificial intelligence, machine learning, nesterov acceleration, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Nesterov acceleration despite very noisy gradients

Neural Information Processing Systems, May-26-2025, 18:58:21 GMT

We present a generalization of Nesterov's accelerated gradient descent algorithm. Our algorithm (AGNES) provably achieves acceleration for smooth convex and strongly convex minimization tasks with noisy gradient estimates if the noise intensity is proportional to the magnitude of the gradient at every point. Nesterov's method converges at an accelerated rate if the constant of proportionality is below 1, while AGNES accommodates any signal-to-noise ratio. The noise model is motivated by applications in overparametrized machine learning. AGNES requires only two parameters in convex and three in strongly convex minimization tasks, improving on existing methods.
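The noise model described here can be made concrete with a multiplicative oracle g = (1 + σξ)∇f(x), where ξ = ±1 with equal probability, so that E[g] = ∇f(x) and ||g − ∇f(x)|| = σ||∇f(x)|| exactly. The sketch below is hypothetical and does not implement AGNES, whose update rule and parameters are not given in the abstract. It runs plain (non-accelerated) gradient descent with the damped step 1/(L(1 + σ²)), which by the standard descent lemma decreases f in expectation by at least ||∇f(x)||²/(2L(1 + σ²)) per step for any σ. The oracle, the test quadratic, and σ = 2 are assumptions chosen for illustration.

```python
import random


def noisy_gradient(grad, x, sigma, rng):
    # Multiplicative noise: g = (1 + sigma * xi) * grad f(x), xi = +1 or -1.
    # E[g] = grad f(x) and ||g - grad f(x)|| = sigma * ||grad f(x)||.
    xi = 1.0 if rng.random() < 0.5 else -1.0
    return [(1.0 + sigma * xi) * gj for gj in grad(x)]


def noisy_gd(grad, x0, L, sigma, iters, seed=0):
    # Plain gradient descent (not AGNES) with the damped step 1/(L (1 + sigma^2)).
    rng = random.Random(seed)
    step = 1.0 / (L * (1.0 + sigma**2))
    x = list(x0)
    for _ in range(iters):
        g = noisy_gradient(grad, x, sigma, rng)
        x = [xj - step * gj for xj, gj in zip(x, g)]
    return x


# Hypothetical test problem: f(x) = 1/2 * (0.1 x_0^2 + x_1^2), so L = 1.
A_DIAG = [0.1, 1.0]


def f(x):
    return 0.5 * sum(a * xj * xj for a, xj in zip(A_DIAG, x))


def grad_f(x):
    return [a * xj for a, xj in zip(A_DIAG, x)]


# sigma = 2: the noise is twice as large as the gradient itself.
x_final = noisy_gd(grad_f, [1.0, 1.0], L=1.0, sigma=2.0, iters=3000)
print(f(x_final))
```

Damping the step by 1 + σ² is what lets a non-accelerated method tolerate any noise level under this model; the abstract's claim is that AGNES keeps acceleration in the same regime.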

artificial intelligence, machine learning, nesterov acceleration, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.90)

Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms

Neural Information Processing Systems, Jan-19-2025, 12:10:35 GMT

continuized acceleration, deterministic and stochastic gradient descent, nesterov acceleration, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
